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Abstract 



We investigate Liischer's method of including dynamical Wilson fermions in a lattice sim- 
ulation of QCD with two quark flavours. We measure the accuracy of the approximation 
by comparing it with Hybrid Monte Carlo results for gauge plaquette and Wilson loops. 
We also introduce an additional global Metropolis step in the update. We show that the 
complexity of Liischer's algorithm compares favourably with that of the Hybrid Monte 
Carlo. 
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1 Introduction 



Taking into account the effects of sea-quarks in lattice QCD simulations presents serious 
technical difficulties. The main problem is the excessive cost in computer time due to 
the need of evaluating the fermion matrix determinant for each gauge field update step. 
The most used algorithm, Hybrid Monte Carlo (HMC) fij, encounters dramatic problems 
of CPU time costs and slowing down in the limit of small quark masses, a fact that has 
generally forced people to use the valence (quenched) approximation in which the fermion 
determinant is assumed to be unity. For many observables the results obtained in this 
approximation fit quite well with phenomenologically known properties of QCD, but for 
other observables the quenched approximation is definitely too crude (for example a s ). 

The urgent need to be able to perform calculations in full QCD has motivated substan- 
tial activity to overcome the technical problems caused by the full dynamical treatment of 
the fermions. The search for a less expensive algorithm for lattice sea-quarks simulation 
is therefore a very actual subject, on which recently a new proposal has been made by M. 
Liischer [Q]. The new method consists in approximating the inverse of the fermion matrix 
with a polynomial, and then interpreting the determinant as the partition function of a 
local bosonic system. Simulating this bosonic system will enable us to evaluate the effects 
of the sea-quarks. 

This algorithm is quite general, and its application to various systems has been and is 
being studied, with the aim of optimizing it and rendering it economically viable. In this 
paper, we will report the results of a first exploratory application of Liischer's algorithm 
to an SU(3) Lagrangian with two degenerate flavours of Wilson quarks. The observables 
measured are the gauge plaquette and Wilson loops, for which we study the autocorrelation 
time and precision of the approximation for numerous sets of parameters. We also show 
the effect of inserting a global Metropolis step in the update, as suggested in reference Q, 
with the aim of making the algorithm more precise and more efficient. 



2 The algorithm 

The partition function of lattice QCD with two fermion flavours can be written as 

Z = J [dU}detQ 2 e~ SG ^ (1) 
where Syf/] is the pure gauge action and the fermion matrix Q has the form: 

Q=^ D + m \ wi th M= 8 -+m. (2) 

cm M a 

Here a is the lattice spacing, and Q has been chosen hermitian, and with eigenvalues con- 
tained in the interval (—1, 1). The constant cm > 1 is fixed to the values 1.1 as explained 
inref. g. 

As anticipated, the first step is approximating 1/Q 2 with a polynomial P(Q 2 ). In 
choosing the form of P(s), we follow Q, where its explicit expression in terms of Chebyshev 
polynomials is given. 
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P(s) is of even degree n, and approximates the function 1/s in an interval e < s < 1, 
where e is a parameter to be chosen according to the range of eigenvalues of Q 2 ; the n 
roots of P(s), Zk, lie on an ellipse in the complex plane with foci e and 1, and minor axis of 
length 2y/e; they come in complex conjugate pairs, with Im Zf. ^ 0. Since the roots come 
in complex conjugate pairs, P(s) can be written in the form: 

n n 

P(s) = const x Y[(s - z k ) = const x - V^k)(Vs - v 7 ^) • ( 3 ) 

i=\ i=l 

A measure of the relative error of the approximation is given by R(s) = P(s)s — 1 , which 
in the interval e < s < 1 is bounded by 

|fl( a )|<2^^ = S(n,e). (4) 

When s < e the polynomial continues to converge, but with an s-dependent exponential 
rate approaching zero in the s — * limit. 

We can now introduce a set of boson fields 0& (k = 1, ...n) with partition function Z&, 
approximating the fermion determinant 

r n t _ 1 

Z b [U] = j J{[dM[d$l]e-^ Q -^ {Q -^ k = detpm - det Q 2 ■ (5) 

The partition function of full QCD is then approximated by the local bosonic action 

Z = J [dU] detQ 2 e~ SG ^ ~ / [dU] Z b [U] e~ s ^ . (6) 

Making use of this action, we may now simulate the fermion determinant by locally up- 
dating the boson fields U and (j), using algorithms like heat-bath and over-relaxation. The 
implementation of the program is described in Appendix A. 



3 Numerical simulations on the 4 4 lattice for the pure Liischer 
algorithm 

The numerical simulations of the theory described by dq) were performed on a 4 4 lattice 
for various choices of the parameters e and n, at /3 = 6.0, for k = 0.12 and 0.14 (in the 
deconfined phase), and at (3 = 5.0 for k = 0.15 (in the confined phase). 

For each set of parameters, an average of 10,000 sweeps were performed, each sweep 
consisting of one (ft heat-bath, one eft over-relaxation, and one to six U over-relaxations (see 
further on for discussion), ergodicity being ensured by the 4> heat-bath. After each sweep 
the gauge boson plaquette was measured. 

For all k values, the minimum eigenvalue of Q 2 was measured for approximately 10,000 
configurations. The average results are: 
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The averages of the lowest eigenvalues are shown in Fig. 1 for k = 0.12 and (3 = 6. As 
a check, measurements of the plaquette with different numbers of U field over-relaxation 
steps have been done, monitoring the autocorrelation time: no variation was observed, 
indicating that the autocorrelation time was dominated by the dynamics of the <j> fields. 
Consequently we chose to perform in subsequent runs only one U over-relaxation per sweep. 

For comparison, the plaquette was measured for the same lattice parameters and k val- 
ues also using a HMC program. The results of all the simulations are reported in Table 1. 
In Fig. 2 the plaquette values are plotted as a function of 5(n, e). 

As can be seen from Table 1, many values of e chosen are quite high with respect to the 
lowest Q 2 eigenvalue; despite that, the agreement with the HMC results remains good, on 
this 4 4 lattice, up to e values ten times larger than the lowest Q 2 eigenvalue. This shows 
that the first few low-energy modes of the fermion determinant do not couple strongly to 
the plaquette. On larger lattices where non-local observables can be measured better, one 
would expect that e will have to be chosen closer to the lowest eigenvalue. 

It is also interesting to observe that the convergence of the plaquette to its correct value 
is not exponential in n; in fact it is not necessarily monotonic. For (3 = 5. for example it 
approaches the correct value from above, while the quenched (n = 0) value is below the full 
QCD one. This unexpected behaviour occurs when e is taken much larger then X m in{Q 2 )- 
In such cases 5, taken alone, can not be a good measure of the error: The bulk of the error 
comes from small eigenvalues A < e. 

For k = 0.12, the results show that we can push e up to 0.1, while we see a definite 
breakdown of the approximation at e = 0.5, n = 4 (5 ~ 2. x 10~ 4 ). As we approach 
more critical k values, the eigenvalues of Q 2 decrease and, for k = 0.14, we see that the 
approximation breaks down already at e = 0.08, n = 16, (8 ~ 1. x 10 -4 ). 

For (3 = 5 and k = 0.15, in the confined phase, the minimum eigenvalue is ~ 0.003, and 
satisfactory results can be obtained at e = 0.01, n = 50, (5 ~ 7 x 10~ 5 ). 

Since the CPU costs are expected to be at least proportional to the number of fields n, 
and, in turn, a smaller value of e requires a larger n, we conclude that the most economical 
program will use the highest e and lowest n values compatible with a good approximation 
of the fermion determinant. A more detailed discussion of the CPU costs for obtaining a 
set of uncorrelated configurations is found in the next section. From our observation of the 
plaquette on a 4 4 lattice, it appears that one can raise e up to values significantly larger 
than the lowest Q 2 eigenvalue. 

It will be shown in section 5, that the situation can be further improved by the insertion 
of a global Metropolis test in the updating procedure for the 4> and U fields. The modified 
method will allow us to maintain a good approximation for higher e and therefore lower n, 
thus reducing the CPU cost of the simulation. 
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Table 1: Simulation results for the plaquette on a 4 4 lattice at (3 = 6 for the Liischer 
algorithm and the HMC algorithm. Each plaquette value is an average of about lO'OOO 
measurements. A star * means that the autocorrelation time was too long to be mea- 
sured accurately. The autocorrelation time of the Liischer algorithm is measured in unit of 
iteration sweeps, consisting of one HB and OR of the 4> and one OR of the U. The auto- 
correlation time of the HMC is measured in unit of trajectories, each of them consisting of 
12 molecular dynamic steps. 
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4 Autocorrelation and scaling behaviour for the pure Liischer 
algorithm 



We have measured the autocorrelation of the plaquette in the deconfined and confined 
phases. In Fig. 3, the plaquette autocorrelation time r is shown versus the ratio in 
the deconfined phase: the relationship between the two appears to be linear. On the other 
hand, if we freeze the bosonic fields (ft, the autocorrelation time of the plaquette is 0(1) 
sweep of the gauge fields. So the long correlation times are caused by the slow dynam- 
ics of the (ft fields. These bosons are updated by local algorithms (e.g. heat-bath) which 
have a critical dynamical exponent of 2 in the worst case scenario. This means that, if 
the correlation length of the bosonic action is £ in lattice units, then the correlation time 
will be ~ £ 2 . What then is the correlation length ? The bosonic action is of the form 
\(Q - Hk)<ftk\ 2 + Wk<ftk\ 2 = |(75<9 - 75Wi)<Afc| 2 + Wk<ftk\ 2 - In the deconfined phase the Dirac 
matrix 75 Q is chirally symmetric. But the second term 75 is not: it breaks chiral sym- 
metry, with a characteristic length scale 1/nk- Therefore we expect that the smallest mass 
term [i = mm&(//&) will control the autocorrelation time: r ~ £ 2 ~ —j. It is easy to observe 
from the definition of the roots that the smallest [i~ 2 is proportional to n/^/e. The same 
conclusion r ~ n/y/e can be reached also by considering the correlation length l/z/& given 
by the third term of the bosonic action above [|[. 

In the confined phase, an additional length scale l/m n appears because of chiral sym- 
metry breaking. This length scale may obscure the previous relationship r ~ l/\/e> an d 
weaken the dependence of r on e. Unfortunately our numerical simulations are not accurate 
enough to resolve this issue: they only indicate that r increases very much in the confined 
regime, for a given quark mass and number of bosonic fields. 

While the cost in memory of the program is proportional to the volume of the lattice 
and to the number of auxiliary boson fields, the work (or CPU time) necessary to obtain 
two uncorrelated configurations is proportional to the number of boson fields times the 
autocorrelation time r. 

Assuming the same dependence of r in both phases, which should be a worst 

case scenario, we can predict the slowing down of the algorithm as the quark mass m q 

decreases, e must be tuned according to the lowest eigenvalue of the Q 2 matrix and is 

therefore proportional to m 2 . Requiring the error 5 to be constant one obtains from (Q) 

that n ~ l/m q and therefore, for a fixed volume, the autocorrelation time increases like 

m~ 2 . Since the work per sweep is proportional to the number n ~ l/m q of fields to update, 

one finally obtains that the work to obtain a new, decorrelated configuration, scales like 

_a 
m q . 

As the volume V of the lattice is increased, the scaling behaviour of the algorithm can 
be easily predicted from the exponential convergence of the approximation eq. (||). Let 
us keep the quark mass, and thus e ~ m~ 2 , fixed. Then requiring that the accumulated 
error of the approximation remain constant as V is increased leads to n ~ log V. The 
autocorrelation time is proportional to n, and the work to obtain a new, decorrelated 
configuration proportional to n 2 , so that the work scales like U(logU) 2 . 

To summarize this section, the needed work per new configuration is at most propor- 
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tional to V (logV) 2 m q 3 . This compares favourably with that of the HMC algorithm, 

y 5 / 4 m q 7 ^ 2 or y 5 / 4 m q 13//4 , depending on the scaling of the trajectory length with m q 
@. Either way, the Liischer algorithm is asymptotically faster, with respect to both large 
volumes and small quark masses. 



5 The systematic error and the exact algorithm 

The exact partition function of QCD is given by 

detQ 2 detP(Q 2 ) _ Sg[u] 



[dU] det (Q 2 P{Q 2 )) Z b [U]e- 



Sa[U] 



The partition function used in eq. (||) corresponds to approximating det (Q 2 P(Q 2 )) 
with unity. Hence, for a given configuration of gauge fields U, the error involved in the 
approximation of the determinant is given by 

R[U] = detQ 2 P(Q 2 ) - 1 = J] A * p ( A i) " 1 ( 9 ) 

i 

where the last step follows by simultaneously diagonalising P(Q 2 ) and Q. Aj are the eigen- 
values of Q 2 for the configuration U. 

The actual value of this systematic error changes between two different configurations U 
and U' as R[U] does. The amplitude of these fluctuations can be tuned by an appropriate 
choice of the approximation parameters, the polynomial degree n and the cut-off e. Of 
course, the cost in memory and CPU time will dramatically increase if we are trying to 
improve the approximation. 

A possibility to reduce the error, is to introduce the effect of the term R[U] + 1 in the 
simulation by the insertion of a Metropolis step in the update process, according to the 
following scheme 0: 



We start with a configuration (U, (j)); the operator R[U] + 1 is calculated for the 
configuration U. 

The (p and U fields are updated m times, by performing a <p heat-bath together with 
over-relaxation for the 4> fields and over-relaxation for the U fields. 

The error R[U'] + 1 is calculated for the new gauge configuration U' . 

The new configuration ([/', (j)') is accepted with probability 

R[U'\ + 1" 



Pace = min 



R[U] + 1 
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This combination of heat-bath, over-relaxation and Metropolis update ensures that the 
Lagrangian simulated is the exact QCD one of eq. @. 

The computation of R[U] + 1 is performed by computing the Q 2 eigenvalues Aj and 
using: 

L 

R[U] + l = Y[X l P(X l ) . (10) 
i=l 

This is a computationally intensive step, because in theory the full spectrum of the matrix 
Q 2 has to be determined. In practice, however, when the polynomial is precise enough 
(S(n,e) < 10 -4 ), the dominant contribution to ([To| ) comes from the low-lying eigenvalues 
A < e of Q 2 , and the product (10) can be cut-off at some lower bound L' . With this method 
the CPU cost invested for the evaluation of R[U] becomes irrelevant if the Metropolis tests 
are divided by a sufficient number m of updating steps and the number of eigenvalues A < e 
is not excessive. On a 4 4 lattice, possibly thanks to the low density of the eigenvalues, 
we can choose e even ten times larger than the lowest eigenvalue, reducing even further 
the CPU time. Of course, on bigger lattices, e will have to be chosen closer to the lowest 
eigenvalue. Notice that the work needed for the global Metropolis test is proportional to 
V 2 , which renders this algorithm costly for very large lattices. 

To obtain the low lying spectrum of Q 2 , we used the Lanczos algorithm. Attention must 
be paid to roundoff errors, and we chose the method advocated by Cullum and Willoughby 
to filter out spurious "eigenvalues" ||(see appendix B). 



6 Numerical simulations with the exact algorithm 

Numerical simulations using the exact algorithm described in the previous paragraph have 
been performed on a 4 4 lattice in the deconfined phase, at = 6.0. For the choice of the 
(e, n) parameters we have been guided by our experience with the pure Liischer algorithm. 
A few exploratory results for a 8 4 lattice, at f3 = 5.3, and k = 0.156 and 0.162 (therefore, 
in the confined phase) will be also presented in the next section, and the related problems 
discussed. 

For the 4 4 lattice, we concentrated on k = 0.14, where the e range acceptable for the pure 
Liischer algorithm is more limited. We explored the capabilities of the modified algorithm 
by varying e and n and compared with previous results at the same parameter values. 

The results are listed in Table 2. In the last column of this table, the number of Lanc- 
zos iterations performed is reported. The relationship between the number of Lanczos 
iterations and the actual cut-off L' is linear. The values for the average acceptance of 
the Metropolis step are also shown. Finally, since we chose a value of m, the number of 
(0, U) sweeps, equal to or larger than the autocorrelation time r (i.e: m = 10), the auto- 
correlation of the new results in our new units is always r ~ 1, and is not shown in the table. 

In Fig. 4 the plaquette values are plotted vs. 5(e,n), along with the corresponding re- 
sults from the pure Liischer algorithm. From Table 2 and Fig. 4, it is clear that, when the 
pure Liischer algorithm gives satisfactory results, acceptance for the Metropolis step is very 
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Table 2: Simulation results for the plaquette on a 4 4 lattice at (3 = 6 for the exact algorithm 
(Liischer+Metropolis) and the HMC algorithm. Each plaquette value is an average of about 
lO'OOO measurements. 



high, and no significant differences are observed when using the exact algorithm. Where 
the approximation of the pure Liischer algorithm breaks down (in Table 2, for parameters 
e = 0.08, n = 16), on the other hand, the average acceptance goes down to about 70%, and 
a significant improvement in the plaquette value is observed, meaning that the insertion of 
the R[U] + 1 term actually corrects the error the approximation makes for s < e. This is 
clearly illustrated in Fig. 5, where the plaquette value is shown to improve as more and 
more terms in the product Y\i=i \,P{\) are computed. 

These results show that the exact algorithm can be effectively used to perform com- 
putations at higher values of e or lower values of n than acceptable for the pure Liischer 
algorithm, thus reducing the CPU cost of the simulation. 

As for the scaling behaviour, that of the original version of the algorithm is V(logV) 2 . 
The version of the algorithm with the global Metropolis test introduces additional work for 
the evaluation of the low spectrum of the fermion matrix. Its scaling behaviour is more 
critical: aV(logV) 2 + bV 2 , where a 3> b are prefactors. 

7 Numerical simulation on the 8 4 lattice 

We finally present here a few results obtained on an 8 4 lattice with the aim of exploring 
the behaviour of the pure Liischer and exact algorithm in a situation nearer to the physical 
one, i.e. on a larger lattice in the confined phase, and with relatively light quarks. The data 
taken are not exhaustive, and we only intend to sketch the possible problems inherent to 
simulations with such parameters, and show the indications coming from a few numerical 
results. 

The main problem we must be ready to encounter has already been mentioned: on an 
8 4 lattice, the density of the Q 2 eigenvalues is higher than on a 4 4 one. This means, on the 
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Table 3: Simulation results for the plaquette on a 8 4 lattice at f3 = 5.3 for the 
Liischer+Metropolis algorithm and the HMC algorithm. Each plaquette value is an av- 
erage of about 4'000 measurements. Here W(I, J) represents the I x J Wilson loop, and 
X(2, 2) is the (2, 2) Creutz ratio. 



one hand, that we will have to use an e value much closer to the minimum eigenvalue than 
in the 4 4 case, and on the other hand that, when the global Metropolis test is introduced, 
a very large number of Lanczos iterations will be necessary to determine all eigenvalues up 
to e. 

Let us now look, in Table 3 and Fig. 6, at the results of the simulations performed at 
j3 = 5.3, k = 0.156 with e = 0.007, n = 80 (6 = 3 x 10" 6 , lowest eigenvalue ~ 0.001), 
and k = 0.162 with e = 0.01, n = 60 (5 = 1 x 10~ 6 , lowest eigenvalue ~ 0.0005), both 
without and with the global Metropolis test. We chose the j3 and k values such as to re- 
produce HMC results by Gupta et al. ||, which are also reported in Table 3, for comparison. 

Both the (e, n) range explored and the statistics are too poor to allow us to draw 
definitive conclusions; nevertheless, we can derive from these data some indications. For 
K = 0.156, where we used e = 0.007, with a lowest eigenvalue ~ 0.001, we can see that the 
pure Liischer algorithm gives quite good results, and the global Metropolis test has little 
influence (corresponding to an acceptance rate of 88%). 

When we go to k = 0.162, with a lowest eigenvalue ~ 0.0005, and simulate at e = 0.01 
(20 times larger), the Liischer algorithm result differs markedly from the HMC one; we now 
introduce the global Metropolis test, and we see that it brings the result to agreement with 
HMC, within the statistical error. 

One apparently puzzling observation is that all plaquette and Wilson loop values con- 
verge to the HMC ones from above, whereas, since the quenched results are lower than the 
full QCD ones, one might have expected a monotonic convergence from below. To investi- 
gate this phenomenon, we combine the Wilson loop results to form the (2, 2) Creutz ratio 
X(2,2) 

^(2,2)1^(1,1) 

X (2,2) = -log , (11) 
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which has a physical significance, since it approximates (era) 2 , the squared string tension 
in lattice units. From Table 3 and Fig. 7 we see that this quantity converges to the correct 
limit from below, whereas the quenched value is above the full QCD one. Thus monotonic 
convergence is not guaranteed, even for physical observables. This will make extrapolation 
to the exact results a rather delicate matter. 

From the analysis of these data, we have confirmation of previous results on the accuracy 
of the approximation and also of the expected CPU requirements at larger volume. 

8 Conclusion 

We have presented first results of full QCD simulations using Liischer's method. Measure- 
ments of the plaquette on a 4 4 lattice show that the approximation remains quite accurate 
for that observable, even when the cutoff parameter e is increased far above the lowest 
eigenvalue \ m in{Q 2 )- They also support our analysis of the complexity of the algorithm: 
V(logU) 2 with the volume V of the system, m~ 3 or better with the quark mass m q . This 

compares favorably with Hybrid Monte Carlo (U 5//4 and m q 13//4 ). 

Larger Wilson loop measurements on an 8 4 lattice reveal the systematic errors of the 
method. Unlike most approximations to dynamical quarks, it turns out that the approach 
of Wilson loops and Creutz ratios to their correct values is not monotonic as the approxi- 
mation improves. 

We have tested a variant of Liischer's algorithm, which includes a Metropolis step to 
correct for the breakdown of the approximation for small eigenvalues. This extra step brings 
Wilson loops to their correct values within errors, for an overhead which is small on our 
lattice sizes, but grows like V 2 . Further variants are being explored, in particular an exact 
non-hermitian one proposed in ||. 
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Appendix 



A Implementation 

The program is written in FORTRAN 77 for the Cray-T3D using communication primitives 
of the Cray. Its structure is organised in four levels and a user interface facilitates multiuser 
manipulations. From the bottom up, the first level controls communication, navigation (see 
below) and memory allocation between the processors (PE's). The second level constructs 
all objects needed in the simulations (for example staples, plaquettes, Q and Q 2 vector 
multiplications,...). The third level contains all the algorithms used for the simulation. The 
fourth level is the envelope which controls the whole program. A common user interface to 
facilitate compilation and input /output is defined. 

The lattice A is first divided into hypercubes of side 2 (containing 2 4 points each), then 
adjacent hypercubes are grouped into npQequal sets A, C A (i = 0, np— 1), each of which 
will be assigned to one PE. For the navigation in the lattice a mapping Navig : A — > Aj 
between the physical coordinates on the lattice and their allocation as local sublattice co- 
ordinates is stored as a table on each PE. For a given lattice coordinate x G A the mapping 
Navig returns the corresponding local coordinate X{ G Aj and the identification number i 
of the PE which owns X{. The memory allocation of gauge and bosons fields is organised 
according to this mapping. The communication of data is also controlled by this mapping. 

All objects are constructed in parallel by the PE's which locally operate on their sublat- 
tices. The construction of the objects needs some communication of data between PE's. The 
communication is performed asynchronously using the shmem_get/ ' shmemjput routines of 
the Cray-T3D. Only at the algorithmic level the needed synchronisations are defined. 

All algorithms used (heat-bath, over-relaxation, global Metropolis, Lanczos) are con- 
structed using as input the objects. The heat-bath and over-relaxation algorithms for the 
boson fields are specifically written for the Liischer local bosonic action. All other algo- 
rithms are general and can be also used for quenched simulations by simply switching some 
libraries at level 2. 



B Lanczos process 

The Lanczos algorithm is a powerful technique used for the determination of the spectrum 
of sparse, symmetric matrices J/]]. During the iterations of the Lanczos algorithm a m x m 
tridiagonal symmetric matrix is obtained by transforming a n x n matrix A (with m <C 
n) with a matrix Q whose columns are called Lanczos vectors. The extremal eigenvalues of 
are estimates of the extremal eigenvalues of A. 

1 Here np represents the number of used PE's. On the Cray-T3D only np equal to a power of 2 is allowed. 
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In exact arithmetic, the Lanczos vectors are orthogonal. However, in finite precision 
arithmetic, they lose mutual orthogonality as the number of iteration steps m increases. As 
a consequence eigenvalues re-appear, i.e. the algorithm finds "spurious" or "ghost" values 
which are not actual eigenvalues of A, but which are nearly degenerate with them. 

The problem can be solved by explicitly orthogonalizing the Lanczos vectors. However, 
all the Lanczos vectors must then be stored and the memory cost of the algorithm is enor- 
mous. 

Instead we use another clever way of identifying spurious eigenvalues without orthogo- 
nalization, proposed by Cullum and Willoughby [||]. In their algorithm one the eigenvalues 
of T( m ) are compared with the eigenvalues of a matrix T2 which is obtained from by 
deleting its first row and column. If a simple eigenvalue of T^" 1 ) is also a simple eigenvalue 
of T2, then this eigenvalue is spurious. 
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Figure caption 

1. Lower spectrum of the Q 2 matrix at (3 = 6 and k = 0.12 on a 4 4 lattice. 

2. Plaquette versus 5 at (3 = 6, (a) k = 0.12 and (b) k = 0.14 on a 4 4 lattice with the 
pure Liischer algorithm. The lines indicate the HMC values, with their errors. 

3. Autocorrelation time of the plaquette versus n/y/e for a 4 4 lattice, at (3 = 6., (a) 
k = 0.12 and (b) k = 0.14. The lines represent linear fits to the data. 

4. Plaquette versus 5 at (3 = 6, k = 0.14 on a 4 4 lattice with the use of the global 
Metropolis test. The lines indicate the HMC values, with their errors. 

5. Plaquette versus number of Lanczos iterations used in the global Metropolis test, at 
(3 = 6, k = 0.14 on a 4 4 lattice, with parameters e = 0.08 and n =16. The lines 
indicate the HMC value, with its error. 

6. Plaquette (a) and Wilson loop WL(1, 2) (b) data from the simulation on the 8 4 lattice 
at /3 = 5.3 with k = 0.156 (O) and k = 0.162 (□). The lines indicate the HMC values, 
with their errors. 

7. Creutz ratio %(2, 2) versus number of Lanczos iterations used in the global Metropolis 
test, on the 8 4 lattice, [3 = 5.3, k = 0.162, e = 0.01 and n = 60. The lines indicate 
the HMC value, with its error. 
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